Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling

Authors

  • Ahuva Mu'alem
  • Dror G. Feitelson
Abstract

Scheduling jobs on the IBM SP2 system and many other distributed-memory MPPs is usually done by giving each job a partition of the machine for its exclusive use. Allocating such partitions in the order in which the jobs arrive (FCFS scheduling) is fair and predictable, but suffers from severe fragmentation, leading to low utilization. This situation led to the development of the EASY scheduler, which uses aggressive backfilling: small jobs are moved ahead to fill in holes in the schedule, provided they do not delay the first job in the queue. We compare this approach with a more conservative approach in which small jobs move ahead only if they do not delay any job in the queue and show that the relative performance of the two schemes depends on the workload: for workloads typical on SP2 systems, the aggressive approach is indeed better, but, for other workloads, both algorithms are similar. In addition, we study the sensitivity of backfilling to the accuracy of the runtime estimates provided by the users and find a very surprising result: backfilling actually works better when users overestimate the runtime by a substantial factor.

Index Terms: Parallel job scheduling, backfilling, runtime estimates, workload modeling, performance metrics.
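The aggressive backfilling rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or the EASY source code; the names `Job`, `shadow_and_extra`, and `can_backfill_easy` are hypothetical, and the sketch assumes that running jobs end exactly at their user-estimated times. It checks whether a waiting job may start now without delaying the first job in the queue.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int       # processors requested
    estimate: float  # user-supplied runtime estimate


def shadow_and_extra(now, free, running, head):
    """Return (shadow_time, extra_procs) for the first queued job.

    running is a list of (estimated_end_time, procs) pairs.
    shadow_time is the earliest time the head job can start;
    extra_procs are processors left over at that time.
    """
    if head.procs <= free:
        return now, free - head.procs
    avail = free
    for end, procs in sorted(running):
        avail += procs  # processors released when this job ends
        if avail >= head.procs:
            return end, avail - head.procs
    raise ValueError("head job requests more processors than exist")


def can_backfill_easy(now, free, running, head, cand):
    """Aggressive rule (assumed form): cand may start now if it fits
    in the free processors and does not delay the head job."""
    if cand.procs > free:
        return False
    shadow, extra = shadow_and_extra(now, free, running, head)
    ends_in_time = now + cand.estimate <= shadow
    uses_only_extra = cand.procs <= extra
    return ends_in_time or uses_only_extra
```

The conservative variant would apply a similar check against the reserved start time of every queued job rather than only the first, which is why it moves fewer jobs ahead.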

Similar resources

Supporting Priorities and Improving Utilization of the IBM SP2 Scheduler Using Slack Based Backfilling

Running jobs on the IBM SP2, as on most distributed-memory parallel systems on the market today, is done by giving each job a subset of the available processors for its exclusive use. Scheduling jobs in FCFS order suffers from severe fragmentation that leads to utilization loss. This led Argonne National Lab, where the first large SP1 was installed, to develop the EASY scheduler, which has since...

Utilization and Predictability in Scheduling the IBM SP2 with Backfilling

Scheduling jobs on the IBM SP2 system is usually done by giving each job a partition of the machine for its exclusive use. Allocating such partitions in the order that the jobs arrive (FCFS scheduling) is fair and predictable, but suffers from severe fragmentation, leading to low utilization. An alternative is to use the EASY scheduler, which uses aggressive backfilling: small jobs are moved ah...

Backfilling Using Runtime Predictions Rather Than User Estimates

The most commonly used scheduling algorithm for parallel supercomputers is FCFS with backfilling, as originally introduced in the EASY scheduler. Backfilling means that short jobs are allowed to run ahead of their time provided they do not delay previously queued jobs (or at least the first queued job). To make such determinations possible, users are required to provide estimates of how long jo...

Supporting Priorities and Improving Utilization of the IBM SP Scheduler Using Slack-Based Backfilling

Distributed memory parallel systems such as the IBM SP2 execute jobs using variable partitioning. Scheduling jobs in FCFS order leads to severe fragmentation and utilization loss, which led to the development of backfilling schedulers such as EASY. This paper presents a backfilling scheduler that improves EASY in two ways: It supports both user selected and administrative priorities, and guara...

The Impact of Task Runtime Estimate Accuracy on Scheduling Workloads of Workflows

Workflow schedulers often rely on task runtime estimates when making scheduling decisions, and they usually target the scheduling of a single workflow or batches of workflows. In contrast, in this paper, we evaluate the impact of the absence or limited accuracy of task runtime estimates on slowdown when scheduling complete workloads of workflows that arrive over time. We study a total of seven ...

Journal:
  • IEEE Trans. Parallel Distrib. Syst.

Volume 12

Publication date: 2001